Compositional data and Simpson’s paradox

Authors

  • V. Pawlowsky-Glahn
  • J. J. Egozcue
Abstract

Simpson’s paradox, also known as the amalgamation or aggregation paradox, appears when dealing with proportions. Proportions are by construction parts of a whole, and they can be interpreted as compositions if they are assumed to carry only relative information. The Aitchison inner product space structure of the simplex, the sample space of compositions, explains why the paradox appears: amalgamation is a non-linear operation within that structure. Here we propose to use balances, which are specific elements of this structure, to analyse situations where the paradox might appear. With the proposed approach, the centre of the tables analysed emerges as a natural way to compare them, and this comparison avoids the paradox by construction.
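The contrast between amalgamation and the centre can be illustrated with a small sketch. It is not the authors' procedure: it assumes two-part compositions (success, failure), an unweighted centre, and illustrative counts chosen to show a reversal. The function names are hypothetical. For two parts, the single balance is ln(success/failure)/√2, and the centre of several compositions has a balance equal to the arithmetic mean of their balances.

```python
import math

# Illustrative (successes, trials) counts per stratum; not taken from the paper.
tables = {
    "A": [(81, 87), (192, 263)],
    "B": [(234, 270), (55, 80)],
}

def balance(successes, trials):
    """Balance (ilr coordinate) of the 2-part composition (success, failure)."""
    return math.log(successes / (trials - successes)) / math.sqrt(2)

def amalgamated_rate(strata):
    """Success proportion after pooling the counts of all strata."""
    return sum(s for s, _ in strata) / sum(n for _, n in strata)

def centre_balance(strata):
    """Balance of the (unweighted) centre: the mean of the stratum balances."""
    return sum(balance(s, n) for s, n in strata) / len(strata)

def balance_to_rate(b):
    """Map a balance back to a success proportion (inverse ilr for 2 parts)."""
    return 1.0 / (1.0 + math.exp(-math.sqrt(2) * b))

for name, strata in tables.items():
    per_stratum = [round(s / n, 3) for s, n in strata]
    print(
        name,
        "per-stratum rates:", per_stratum,
        "amalgamated:", round(amalgamated_rate(strata), 3),
        "centre:", round(balance_to_rate(centre_balance(strata)), 3),
    )
```

In this sketch, A has the higher success rate in each stratum but the lower amalgamated rate. Because the centre is an average of balances, a table that dominates in every stratum also dominates at the centre, so the ordering cannot reverse there.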

Related resources

Computational Social Scientist Beware: Simpson's Paradox in Behavioral Data

Observational data about human behavior is often heterogeneous, i.e., generated by subgroups within the population under study that vary in size and behavior. Heterogeneity predisposes analysis to Simpson’s paradox, whereby the trends observed in data that has been aggregated over the entire population may be substantially different from those of the underlying subgroups. I illustrate Simpson’s...

Simpson’s Paradox in the interpretation of “leaky pipeline” data

The traditional ‘leaky pipeline’ plots are widely used to inform gender equality policy and practice. Herein, we demonstrate how a statistical phenomenon known as Simpson’s paradox can obscure trends in gender ‘leaky pipeline’ plots. Our approach has been to use Excel spreadsheets to generate hypothetical ‘leaky pipeline’ plots of gender inequality within an organisation. The principal factors,...

Integrating Bayesian Networks and Simpson’s Paradox in Data Mining

This paper proposes to integrate two very different kinds of methods for data mining, namely the construction of Bayesian networks from data and the detection of occurrences of Simpson’s paradox. The former aims at discovering potentially causal knowledge in the data, whilst the latter aims at detecting surprising patterns in the data. By integrating these two kinds of methods we can hopefully ...

Simpson’s Paradox – A Survey of Past, Present and Future Research

Simpson’s paradox refers to the reversal of a statistical relationship between two variables in sub-populations when the sub-populations are combined and analyzed as a population. This article is intended to provide a broad survey of the past, present and future research surrounding the issue. Real data from a discrimination litigation case is examined to identify the occurrence of the paradox....

How Likely is Simpson’s Paradox?

What proportion of all 2×2×2 contingency tables exhibit Simpson’s Paradox? An approximate answer is obtained for large sample sizes and extended to 2×2×l tables. Several conditional probabilities of the occurrence of Simpson’s Paradox are also derived. Given that the observed cell frequencies satisfy a Simpson reversal, the posterior probability that the population parameters satisfy the same...

Publication date: 2008